README rewrite (c't style) + COMPARISON.md + AMX SIGILL fix #103

Merged: AdaWorldAPI merged 1 commit into master from claude/setup-rust-smart-home-SOPAY on Apr 13, 2026
@AdaWorldAPI (Owner)

Summary

  • README EN+DE rewritten in c't magazine style (technical depth, pyramid structure, methodology transparency)
  • COMPARISON.md: complete feature inventory (80K LOC, 146 HPC modules, 179 files)
  • 3-level cascade section explaining how palette cosine replaces FP32 dot product
  • Cosine vs GPU table with Sapphire Rapids, i7-11th gen, Pi 4, Pi Zero 2W, RTX 3060, H100
  • Cherry-picked AMX SIGILL fix (282daf7): _xgetbv(0) + prctl instead of CPUID leaf 0xD

AMX SIGILL fix detail

Root cause: amx_available() checked CPUID (what the CPU supports) rather than _xgetbv(0) (what the OS enabled). Hypervisors that advertise AMX without enabling tile state caused a SIGILL on LDTILECFG.

Fix: 4-step detection (CPUID → OSXSAVE → _xgetbv(0) → prctl). Before: SIGILL crash. After: 1612 pass, 0 fail.

Test plan

  • AMX detection tests pass (6/6 simd_amx tests)
  • No code changes to SIMD/HPC modules (documentation + AMX fix only)
  • README renders correctly on GitHub

https://claude.ai/code/session_017ZN5PNEf8boFBgorUZVrFU

…ection

Root cause: amx_available() used CPUID leaf 0xD (what CPU supports)
instead of _xgetbv(0) (what OS actually enabled). Hypervisors that
advertise AMX in CPUID but don't enable tile state caused SIGILL.

Fix — 4-step detection:
  1. CPUID.07H bits 24+25 → CPU has AMX-TILE + AMX-INT8?
  2. CPUID.01H bit 27     → OS supports XSAVE?
  3. _xgetbv(0) bits 17+18 → OS ACTUALLY enabled tile state?
  4. prctl(ARCH_REQ_XCOMP_PERM, 18) → process has tile permission?
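Steps 1 to 3 of the list above can be sketched as a pure function over raw register values, which makes the hypervisor case easy to reproduce without AMX hardware. This is a minimal illustration, not the code in src/simd_amx.rs: the names `amx_steps_1_to_3` and `amx_steps_1_to_3_on_this_cpu` are hypothetical, and step 4 (the per-process permission request) is deliberately left out.

```rust
// Bit positions taken from the commit message above.
const AMX_TILE_BIT: u32 = 24; // CPUID.07H:EDX, AMX-TILE
const AMX_INT8_BIT: u32 = 25; // CPUID.07H:EDX, AMX-INT8
const OSXSAVE_BIT: u32 = 27; // CPUID.01H:ECX, OS uses XSAVE/XGETBV
const XCR0_TILE_MASK: u64 = (1 << 17) | (1 << 18); // XTILECFG + XTILEDATA

fn bit(value: u32, n: u32) -> bool {
    (value >> n) & 1 == 1
}

/// Hypothetical helper: steps 1-3 over raw register values.
/// `read_xcr0` is only called once OSXSAVE is confirmed, because
/// XGETBV itself faults when the OS has not enabled XSAVE.
fn amx_steps_1_to_3(
    cpuid7_edx: u32,
    cpuid1_ecx: u32,
    read_xcr0: impl FnOnce() -> u64,
) -> bool {
    // Step 1: does the CPU advertise AMX-TILE and AMX-INT8?
    if !(bit(cpuid7_edx, AMX_TILE_BIT) && bit(cpuid7_edx, AMX_INT8_BIT)) {
        return false;
    }
    // Step 2: has the OS enabled XSAVE, so XCR0 can be read safely?
    if !bit(cpuid1_ecx, OSXSAVE_BIT) {
        return false;
    }
    // Step 3: did the OS actually enable both tile state components?
    (read_xcr0() & XCR0_TILE_MASK) == XCR0_TILE_MASK
}

/// Hypothetical wrapper that feeds real register values into the check.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
fn amx_steps_1_to_3_on_this_cpu() -> bool {
    use std::arch::x86_64::{__cpuid, __cpuid_count, _xgetbv};
    unsafe {
        // Leaf 7 is only meaningful if the CPU reports it as supported.
        if __cpuid(0).eax < 7 {
            return false;
        }
        let leaf7_edx = __cpuid_count(7, 0).edx;
        let leaf1_ecx = __cpuid(1).ecx;
        amx_steps_1_to_3(leaf7_edx, leaf1_ecx, || _xgetbv(0))
    }
}
```

Passing CPUID values with AMX bits set together with an XCR0 of 0x7 (x87, SSE, AVX only) models the hypervisor scenario described in the root cause: the function returns false instead of letting the caller reach LDTILECFG.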

VNNI hierarchy documented:
  avx512vnni (EVEX zmm, 64 MACs) → checked first
  avxvnniint8 (VEX ymm, 32 MACs) → only if avx512vnni absent
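The ordering described above could be expressed as a small selection function. The sketch below is illustrative only: `VnniPath`, `select_vnni_path`, and the scalar fallback are assumptions that do not appear in the PR, and the feature flags are passed in as plain booleans so the ordering can be checked on any machine.

```rust
/// Hypothetical enum naming the two VNNI paths from the commit message,
/// plus an assumed scalar fallback for CPUs with neither extension.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VnniPath {
    /// avx512vnni: EVEX-encoded, zmm registers.
    Avx512Vnni,
    /// avxvnniint8: VEX-encoded, ymm registers.
    AvxVnniInt8,
    /// Assumed fallback, not described in the source.
    Scalar,
}

impl VnniPath {
    /// MACs per instruction as stated in the commit message.
    /// The scalar figure of 1 is an assumption for illustration.
    fn macs_per_instruction(self) -> u32 {
        match self {
            VnniPath::Avx512Vnni => 64,
            VnniPath::AvxVnniInt8 => 32,
            VnniPath::Scalar => 1,
        }
    }
}

/// avx512vnni is checked first; avxvnniint8 is considered only when
/// avx512vnni is absent.
fn select_vnni_path(has_avx512vnni: bool, has_avxvnniint8: bool) -> VnniPath {
    if has_avx512vnni {
        VnniPath::Avx512Vnni
    } else if has_avxvnniint8 {
        VnniPath::AvxVnniInt8
    } else {
        VnniPath::Scalar
    }
}
```

On x86_64 the first flag could come from `is_x86_feature_detected!("avx512vnni")`. Whether the same macro accepts `"avxvnniint8"` depends on the toolchain version, so that flag may need a manual CPUID query.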

Before: 1612 pass + SIGILL crash
After:  1612 pass, 0 fail, 36 ignored

Cherry-picked from 282daf7 (claude/continue-lance-graph-ndarray-Ld786)

https://claude.ai/code/session_017ZN5PNEf8boFBgorUZVrFU
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f68fefa514

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/simd_amx.rs
// multiple times.
#[cfg(target_os = "linux")]
{
const SYS_PRCTL: i64 = 157; // x86_64 syscall number for prctl

P1: Call arch_prctl when requesting XCOMP permission

On Linux x86_64, ARCH_REQ_XCOMP_PERM is an arch_prctl operation, but this code issues syscall 157 (prctl). That call returns EINVAL for option 0x1023, so amx_available() will return false whenever this branch runs, disabling AMX even on hosts where AMX is actually usable. The request should be made via arch_prctl (syscall 158) with the same option/feature arguments.
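The change this review comment asks for might look roughly like the sketch below. It mirrors the raw-syscall style visible in the quoted snippet (a hard-coded syscall number rather than a libc call), but it is not the merged code: `request_tile_permission` and `xcomp_permission_granted` are hypothetical names, and the inline-assembly wrapper is one possible way to issue the call.

```rust
// x86_64 Linux syscall number for arch_prctl (prctl is 157).
const SYS_ARCH_PRCTL: i64 = 158;
// arch_prctl operation that requests permission for a dynamic XSTATE feature.
const ARCH_REQ_XCOMP_PERM: i64 = 0x1023;
// XSTATE component number for AMX tile data (XCR0 bit 18).
const XFEATURE_XTILEDATA: i64 = 18;

/// Hypothetical helper: a raw syscall returns 0 on success and a
/// negated errno value (for example -22 for EINVAL) on failure.
fn xcomp_permission_granted(ret: i64) -> bool {
    ret == 0
}

/// Hypothetical wrapper: issue arch_prctl(ARCH_REQ_XCOMP_PERM, 18)
/// directly, without going through libc.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn request_tile_permission() -> i64 {
    let ret: i64;
    unsafe {
        std::arch::asm!(
            "syscall",
            inlateout("rax") SYS_ARCH_PRCTL => ret,
            in("rdi") ARCH_REQ_XCOMP_PERM,
            in("rsi") XFEATURE_XTILEDATA,
            // The syscall instruction clobbers rcx and r11.
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    ret
}
```

A caller would treat `xcomp_permission_granted(request_tile_permission())` as step 4 of the detection sequence and fall back to a non-AMX path on any non-zero result.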


@AdaWorldAPI AdaWorldAPI merged commit 00a3c16 into master Apr 13, 2026
6 of 15 checks passed